==13372== LLi misses: 2,158
==13372== I1 miss rate: 0.01%
==13372== LLi miss rate: 0.01%
==13372==
==13372== D refs: 5,400,581 (2,774,345 rd + 2,626,236 wr)
==13372== D1 misses: 273,218 ( 219,462 rd + 53,756 wr)
==13372== LLd misses: 36,267 ( 2,162 rd + 34,105 wr)
==13372== D1 miss rate: 5.1% ( 7.9% + 2.0% )
==13372== LLd miss rate: 0.7% ( 0.1% + 1.3% )
==13372==
==13372== LL refs: 275,672 ( 221,916 rd + 53,756 wr)
==13372== LL misses: 38,425 ( 4,320 rd + 34,105 wr)
==13372== LL miss rate: 0.1% ( 0.0% + 1.3% )
==13372==
==13372== Branches: 3,008,198 (3,006,105 cond + 2,093 ind)
==13372== Mispredicts: 315,772 ( 315,198 cond + 574 ind)
==13372== Mispred rate: 10.5% ( 10.5% + 27.4% )
Let's break this up some:
==13372== I refs: 25,733,614
==13372== I1 misses: 2,454
==13372== LLi misses: 2,158
==13372== I1 miss rate: 0.01%
==13372== LLi miss rate: 0.01%
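As a quick sanity check, the two rate lines in this excerpt can be recomputed from the raw counts. What follows is a minimal Python sketch, not part of cachegrind itself: the excerpt is pasted in by hand, and the names EXCERPT, parse_counts, and miss_rate_percent are our own, chosen for illustration.

```python
import re

# Count lines copied by hand from the cachegrind excerpt above.
EXCERPT = """\
==13372== I refs: 25,733,614
==13372== I1 misses: 2,454
==13372== LLi misses: 2,158
"""

# Matches "==PID== <name>: <count>" where the count is the last thing on the
# line. Rate lines such as "0.01%" do not match and are skipped.
COUNT_LINE = re.compile(r"^==\d+==\s+(?P<name>[^:]+):\s+(?P<count>[\d,]+)\s*$")


def parse_counts(text):
    """Return a dict mapping each counter name to its integer value."""
    counts = {}
    for line in text.splitlines():
        match = COUNT_LINE.match(line)
        if match:
            # Collapse runs of spaces so the column padding does not matter.
            name = " ".join(match.group("name").split())
            counts[name] = int(match.group("count").replace(",", ""))
    return counts


def miss_rate_percent(misses, refs):
    """Misses as a percentage of references."""
    return 100.0 * misses / refs


counts = parse_counts(EXCERPT)
i1_rate = miss_rate_percent(counts["I1 misses"], counts["I refs"])
lli_rate = miss_rate_percent(counts["LLi misses"], counts["I refs"])
print(f"I1 miss rate: {i1_rate:.2f}%")   # prints "I1 miss rate: 0.01%"
print(f"LLi miss rate: {lli_rate:.2f}%")  # prints "LLi miss rate: 0.01%"
```

Both rates round to the 0.01% that cachegrind reports, so each rate line is simply the miss count divided by I refs.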
The first section details our instruction cache behavior. I refs:
25,733,614 tells us that the standard program executed 25,733,614
instructions in all. This count is often a useful point of comparison
between closely related implementations, as we'll see shortly. Recall
that cachegrind simulates a machine with two levels of instruction and
data caching: the first-level caches are referred to as I1 and D1, for
instructions and data respectively, and last-level cache figures are
prefixed with LL. Here, we see that the first-level instruction cache
missed 2,454 times and the last-level cache missed 2,158 times during
our roughly 25.7 million instruction run, a miss rate of about 0.01%
in both cases. That squares with how tiny our program is. The second
section is as follows:
==13372== D refs: 5,400,581 (2,774,345 rd + 2,626,236 wr)
==13372== D1 misses: 273,218 ( 219,462 rd + 53,756 wr)